Process for obtaining DNA, RNA, peptides, polypeptides, or protein, by recombinant DNA technique

ABSTRACT

The present invention is directed to a process for the production of a peptide, polypeptide, or protein having a predetermined property. In accordance with one embodiment, the process begins by producing by way of synthetic polynucleotide coupling, stochastically generated polynucleotide sequences. A library of expression vectors containing such stochastically generated polynucleotide sequences is formed. Next, host cells containing the vectors are cultured so as to produce peptides, polypeptides, or proteins encoded by the stochastically generated polynucleotide sequences. Screening or selection is carried out on such host cells to identify a peptide, polypeptide, or protein produced by the host cells which has the predetermined property. The stochastically generated polynucleotide sequence which encodes the identified peptide, polypeptide, or protein is then isolated and used to produce the peptide, polypeptide, or protein having the predetermined property.

CROSS REFERENCE TO RELATED APPLICATION

[0001] The present application is a continuation of U.S. patent application Ser. No. 942,630 filed Nov. 20, 1986, the entire disclosure of which is incorporated herein by reference.

[0002] The present invention has as its object a process for obtaining DNA, RNA, peptides, polypeptides, or proteins, through use of transformed host cells containing genes capable of expressing these RNAs, peptides, polypeptides, or proteins; that is to say, by utilization of recombinant DNA technique.

[0003] The invention aims in particular at the production of stochastic genes or fragments of stochastic genes in a fashion to permit obtaining simultaneously, after transcription and translation of these genes, a very large number (on the order of at least 10,000) of completely new proteins, in the presence of host cells (bacterial or eukaryotic) containing these genes respectively capable of expressing these proteins, and to carry out thereafter a selection or screen among the said clones, in order to determine which of them produce proteins with desired properties, for example, structural, enzymatic, catalytic, antigenic, pharmacologic, or properties of liganding, and more generally, chemical, biochemical, biological, etc. properties.

[0004] The invention also has as its aim procedures to obtain sequences of DNA or RNA with utilizable properties, notably chemical, biochemical, or biological properties.

[0005] It is clear, therefore, that the invention is open to a very large number of applications in very many areas of science, industry, and medicine.

[0006] The process for production of peptides or polypeptides according to the invention is characterized in that one produces simultaneously, in the same medium, genes which are at least partially composed of synthetic stochastic polynucleotides, that one introduces the genes thus obtained into host cells, that one cultivates simultaneously the independent clones of the transformed host cells containing these genes in such a manner as to clone the stochastic genes and to obtain the production of the proteins expressed by each of these stochastic genes, that one carries out selection and/or screening of the clones of transformed host cells in a manner to identify those clones producing peptides or polypeptides having at least one desired activity, that one thereafter isolates the clones thus identified and that one cultivates them to produce at least one peptide or polypeptide having the said property.

[0007] In a first mode of carrying out this process, stochastic genes are produced by stochastic copolymerization of the four kinds of deoxyphosphonucleotides, A, C, G and T from the two ends of an initially linearized expression vector, followed by formation of cohesive ends in such a fashion as to form a stochastic first strand of DNA constituted by a molecule of expression vector possessing two stochastic sequences whose 3′ ends are complementary, followed by the synthesis of the second strand of the stochastic DNA.

[0008] In a second mode of carrying out this process, stochastic genes are produced by copolymerization of oligonucleotides without cohesive ends, in a manner to form fragments of stochastic DNA, followed by ligation of these fragments to a previously linearized expression vector.

[0009] The expression vector can be a plasmid, notably a bacterial plasmid. Excellent results have been obtained using the plasmid pUC8 as the expression vector.

[0010] The expression vector can also be viral DNA or a hybrid of plasmid and viral DNA.

[0011] The host cells can be prokaryotic cells such as HB 101 and C 600, or eukaryotic cells.

[0012] When utilizing the procedure according to the second mode mentioned above, it is possible to utilize oligonucleotides which form a group of palindromic octamers.

[0013] Particularly good results are obtained by utilizing the following group of palindromic octamers:
5′ GGAATTCC 3′
5′ GGTCGACC 3′
5′ CAAGCTTG 3′
5′ CCATATGG 3′
5′ CATCGATG 3′

[0014] It is also possible to use oligonucleotides which form a group of palindromic heptamers.

[0015] Very good results are obtained utilizing the following group of palindromic heptamers:
5′ XTCGCGA 3′
5′ XCTGCAG 3′
5′ RGGTACC 3′

[0016] where X=A, G, C, or T, and R=A or T

[0017] According to a particularly advantageous method of utilizing these procedures, one isolates and purifies the transforming DNA of the plasmids from a culture of independent clones of the transformed host cells obtained by following the procedures above; the purified DNA is then cut by at least one restriction enzyme corresponding to a specific enzymatic cutting site present in the palindromic octamers or heptamers but absent from the expression vector which was utilized. This cutting is followed by inactivation of the restriction enzyme, then one simultaneously treats the ensemble of linearized stochastic DNA fragments thus obtained with T4 DNA ligase, in such a manner as to create a new ensemble of DNA containing new stochastic sequences; this new ensemble can therefore contain a number of stochastic genes larger than the number of genes in the initial ensemble. One then utilizes this new ensemble of transforming DNA to transform the host cells and clone these genes, utilizes screening and/or selection to isolate the new clones of transformed host cells, and finally cultivates these to produce at least one peptide or polypeptide, for example, a new protein.

[0018] The property serving as the criterion for selection of the clones of host cells can be the capacity of the peptides or polypeptides, produced by a given clone, to catalyze a given chemical reaction.

[0019] For instance, for the production of several peptides and/or polypeptides, the said property can be the capacity to catalyze a sequence of reactions leading from an initial group of chemical compounds to at least one target compound.

[0020] With the aim of producing an ensemble constituted by a plurality of peptides and polypeptides which are reflexively autocatalytic, the said property can be the capacity to catalyze the synthesis of the same ensemble from amino acids and/or oligopeptides in an appropriate milieu.

[0021] The said property can also be the capacity to modify selectively the biological or chemical properties of a given compound, for example, the capacity to selectively modify the catalytic activity of a polypeptide.

[0022] The said property can also be the capacity to stimulate, inhibit, or modify at least one biological function of at least one biologically active compound, chosen, for example, among the hormones, neurotransmitters, adhesion factors, growth factors, and specific regulators of DNA replication and/or transcription and/or translation of RNA.

[0023] The said property can equally be the capacity of the peptide or polypeptide to bind to a given ligand.

[0024] The invention also has as its object the use of the peptide or polypeptide obtained by the process specified above, for the detection and/or the titration of a ligand.

[0025] According to a particularly advantageous mode of carrying out the invention, the criterion for selection of the clones of transformed host cells is the capacity of these peptides or polypeptides to simulate or modify the effects of a biologically active molecule, for example, a protein, and screening and/or selection for clones of transformed host cells producing at least one peptide or polypeptide having this property, is carried out by preparing antibodies against the active molecule, then utilizing these antibodies after their purification, to identify the clones containing this peptide or polypeptide, then by cultivating the clones thus identified, separating and purifying the peptide or polypeptide produced by these clones, and finally by submitting the peptide or polypeptide to an in vitro assay to verify that it has the capacity to simulate or modify the effects of the said molecule.

[0026] According to another mode of carrying out the process according to the invention, the property serving as the criterion of selection is that of having at least one epitope similar to one of the epitopes of a given antigen.

[0027] The invention carries over to obtaining polypeptides by the process specified above and utilizable as chemotherapeutically active substances.

[0028] In particular, in the case where the said antigen is EGF, the invention permits obtaining polypeptides usable for chemotherapeutic treatment of epitheliomas.

[0029] According to a variant of the procedure, one identifies and isolates the clones of transformed host cells producing peptides or polypeptides having the property desired, by affinity chromatography against antibodies corresponding to a protein expressed by the natural part of the hybrid DNA.

[0030] For example, in the case where the natural part of the hybrid DNA contains a gene expressing β-galactosidase, one can advantageously identify and isolate the said clones of transformed host cells by affinity chromatography against anti-β-galactosidase antibodies. After expression and purification of hybrid peptides or polypeptides, one can separate and isolate their novel parts.

[0031] The invention also applies to a use of the process specified above for the preparation of a vaccine; the application is characterized by the fact that antibodies against the pathogenic agent are isolated, for example, antibodies formed after injection of the pathogenic agent in the body of an animal capable of forming antibodies against this agent, and these antibodies are used to identify the clones producing at least one protein having at least one epitope similar to one of the epitopes of the pathogenic agent, the transformed host cells corresponding to these clones are cultured to produce this protein, this protein is isolated and purified from the clones of cells, then this protein is used for the production of a vaccine against the pathogenic agent.

[0032] For example, in order to prepare an anti-HBV vaccine, one can extract and purify at least one capsid protein of the hepatitis B virus (HBV), inject this protein into an animal capable of forming antibodies against this protein, recover the antibodies thus formed and use them to identify the clones producing at least one protein having at least one epitope similar to one of the epitopes of the HBV virus, then cultivate the clones of transformed host cells corresponding to these clones in a manner to produce this protein, isolate and purify the protein from culture of these clones of cells and utilize the protein for the production of an anti-HBV vaccine.

[0033] According to an advantageous mode of carrying out the process according to the invention, the host cells consist of bacteria such as Escherichia coli whose genome contains neither the natural gene expressing β-galactosidase, nor the EBG gene, that is to say, Z⁻, EBG⁻ E. coli. The transformed cells are cultured in the presence of the indicator X-gal and the inducer IPTG in the medium, and cells positive for β-galactosidase functions are detected; thereafter, the transforming DNA is transplanted into an appropriate clone of host cells for large scale culture to produce at least one peptide or polypeptide.

[0034] The property serving as the criterion for selection of the transformed host cells can also be the capacity of the polypeptides or proteins produced by the culture of these clones to bind to a given compound.

[0035] This compound can be in particular chosen advantageously among peptides, polypeptides, and proteins, notably among proteins regulating the transcription activity of DNA.

[0036] On the other hand, the said compound can also be chosen among DNA and RNA sequences.

[0037] The invention has also as its object those proteins which are obtained in the case where the property serving as criterion of selection of the clones of transformed host cells consists in the capacity of these proteins to bind to regulatory proteins controlling transcription activity of the DNA, or else to DNA and RNA sequences.

[0038] The invention has, in addition, as an object, the use of a protein which is obtained in the first particular case above mentioned, as a cis-regulatory sequence controlling replication or transcription of a neighboring DNA sequence.

[0039] On the other hand, the aim of the invention also includes utilization of proteins obtained in the second case mentioned to modify the properties of transcription or replication of a sequence of DNA, in a cell containing the sequence of DNA, and expressing this protein.

[0040] The invention also has as its object a process of production of DNA, characterized by simultaneous production in the same medium, of genes at least partially composed of stochastic synthetic polynucleotides, in that the genes thus obtained are introduced into host cells to produce an ensemble of transformed host cells, in that screening and/or selection on this ensemble is carried out to identify those host cells containing in their genome stochastic sequences of DNA having at least one desired property, and finally, in that the DNA from the clones of host cells thus identified is isolated.

[0041] The invention further has as its object a procedure to produce RNA, characterized by simultaneous production in the same medium, of genes at least partially composed of stochastic synthetic polynucleotides, in that the genes thus obtained are introduced into host cells to produce an ensemble of transformed host cells, in that the host cells so produced are cultivated simultaneously, and screening and/or selection of this ensemble is carried out in a manner to identify those host cells containing stochastic sequences of RNA having at least one desired property, and in that the RNA is isolated from the host cells thus identified. The said property can be the capacity to bind a given compound, which might be, for example, a peptide or polypeptide or protein, or also the capacity to catalyze a given chemical reaction, or the capacity to be a transfer RNA.

[0042] Now the process according to the invention will be described in more detail, as well as some of its applications, with reference to non-limitative embodiments.

[0043] First, we shall describe particularly useful procedures to carry out the synthesis of stochastic genes, and the introduction of those genes in bacteria to produce clones of transformed bacteria.

[0044] I) Direct Synthesis on an Expression Vector.

[0045] a) Linearization of the Vector

[0046] 30 micrograms, that is, approximately 10¹³ molecules of the pUC8 expression vector, are linearized by incubation for 2 hours at 37° C. with 100 units of the Pst1 restriction enzyme in a volume of 300 μl of the appropriate standard buffer. The linearized vector is treated with phenol-chloroform then precipitated in ethanol, taken up in a volume of 30 μl and loaded onto a 0.8% agarose gel in standard TEB buffer. After migration in a field of 3 V/cm for three hours, the linearized vector is electro-eluted, precipitated in ethanol, and taken up in 30 μl of water.
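The figure of approximately 10¹³ molecules in 30 micrograms can be checked with a short calculation. The sketch below is illustrative only: the vector length of 2665 base pairs and the average mass of 650 g/mol per base pair are assumptions not stated in the text, and the function name is hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average molar mass of one base pair
PUC8_LENGTH_BP = 2665        # assumed length of the pUC8 vector in base pairs


def dsdna_molecules(mass_ug, length_bp):
    """Estimate how many double-stranded DNA molecules a given mass contains."""
    molar_mass = length_bp * BP_MASS_G_PER_MOL   # g/mol for the whole vector
    moles = mass_ug * 1e-6 / molar_mass          # micrograms converted to grams
    return moles * AVOGADRO


n_vector = dsdna_molecules(30, PUC8_LENGTH_BP)
print(f"{n_vector:.2e} vector molecules in 30 micrograms")
```

Under these assumptions the estimate falls close to 10¹³, in agreement with the order of magnitude given above.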

[0047] b) Stochastic Synthesis Using the Enzyme Terminal Transferase (TdT)

[0048] 30 μg of the linearized vector are reacted for 2 hours at 37° C. with 30 units of TdT in 300 μl of the appropriate buffer, in the presence of 1 mM dGTP, 1 mM dCTP, 0.3 mM dTTP and 1 mM dATP. The lower concentration of dTTP is chosen in order to reduce the frequency of “stop” codons in the corresponding messenger RNA. A similar result, although somewhat less favorable, can be obtained by utilizing a lower concentration for dATP than for the other deoxynucleotide triphosphates. The progress of the polymerization on the 3′ extremity of the Pst1 sites is followed by analysis on a gel of aliquots taken during the course of the reaction.
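The effect of the lower dTTP concentration on the frequency of “stop” codons can be illustrated with a simple probability estimate. The sketch below assumes, as a simplification not stated in the text, that TdT incorporates each nucleotide independently and in proportion to its concentration; real incorporation rates may differ. The function names are hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_probability(conc):
    """Probability that a random triplet is a stop codon, for independent
    bases drawn in proportion to the given concentrations."""
    total = sum(conc.values())
    freq = {base: c / total for base, c in conc.items()}
    return sum(freq[c[0]] * freq[c[1]] * freq[c[2]] for c in STOP_CODONS)


def complement_composition(conc):
    """Base composition of the complementary strand (A<->T, C<->G)."""
    return {"A": conc["T"], "T": conc["A"], "C": conc["G"], "G": conc["C"]}


equimolar = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0}   # mM, reference case
low_dttp = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 0.3}    # mM, as in the text

p_equal = stop_codon_probability(equimolar)
p_low_t = stop_codon_probability(low_dttp)
p_low_t_complement = stop_codon_probability(complement_composition(low_dttp))

print(f"equimolar:               {p_equal:.4f}")
print(f"0.3 mM dTTP, this strand: {p_low_t:.4f}")
print(f"0.3 mM dTTP, complement:  {p_low_t_complement:.4f}")
```

Under this model the per-codon stop probability falls from 3/64 in the equimolar case to roughly 0.025 on the synthesized strand and roughly 0.019 on its complement.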

[0049] When the reaction attains or passes a mean value of 300 nucleotides added per 3′ extremity, it is stopped and the free nucleotides are separated from the polymer by differential precipitation or by passage over a column containing a molecular sieve such as Biogel P60. After concentration by precipitation in ethanol, the polymers are subjected to a further polymerization with TdT, first in the presence of dATP, then in the presence of dTTP. These last two reactions are separated by a filtration on a gel and are carried out for short intervals (30 seconds to 3 minutes) in order to add sequentially 10-30 A followed by 10-30 T to the 3′ ends of the polymers.

[0050] c) Synthesis of the Second Strand of Stochastic DNA

[0051] Each molecule of vector possesses, at the end of the preceding operation, two stochastic sequences whose 3′ ends are complementary. The mixture of polymers is therefore incubated in conditions favoring hybridization of the complementary extremities (150 mM NaCl, 10 mM Tris-HCl, pH 7.6, 1 mM EDTA at 65° C. for 10 minutes, followed by lowering the temperature to 22° C. at a rate of 3 to 4° C. per hour). The hybridized polymers are then reacted with 60 units of the large fragment (Klenow) of polymerase I, in the presence of the four nucleotide triphosphates (200 mM) at 4° C. for two hours. This step accomplishes the synthesis of the second strand from the 3′ ends of the hybrid polymers. The molecules which result from this direct synthesis starting from linearized vector are thereafter utilized to transform competent cells.
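The complementarity of the 3′ ends follows from the order in which the A and T tails are added: the T run at the 3′ end of one strand can pair, in antiparallel orientation, with the A run of the other. The sketch below illustrates this with two hypothetical strands whose tail lengths lie within the 10-30 range given above; the stochastic prefixes and the helper names are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tail(n_a, n_t):
    """3' tail made of n_a adenines followed by n_t thymines."""
    return "A" * n_a + "T" * n_t


def three_prime_overlap(strand1, strand2):
    """Longest k for which the last k bases of both strands pair antiparallel."""
    best = 0
    for k in range(1, min(len(strand1), len(strand2)) + 1):
        if strand1[-k:] == reverse_complement(strand2[-k:]):
            best = k
    return best


# Hypothetical stochastic stretches followed by tails of unequal length.
strand1 = "GGCC" + tail(15, 15)
strand2 = "CCGG" + tail(20, 10)

print(reverse_complement(tail(15, 15)) == tail(15, 15))  # equal tails are palindromic
print(three_prime_overlap(strand1, strand2))             # paired bases at the 3' ends
```

In this example the duplex spans 25 bases, the sum of the two T runs, because each A run is at least as long as the T run of the opposite strand.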

[0052] d) Transformation of Competent Clones

[0053] 100 to 200 ml of competent HB 101 or C 600 at a concentration of 10¹⁰ cells/ml, are incubated with the stochastic DNA preparation (from above) in the presence of 6 mM CaCl₂, 6 mM Tris-HCl pHG, 6 mM MgCl₂ for 30 minutes at 0° C. A temperature shock of 3 minutes at 37° C. is imposed on the mixture, followed by the addition of 400 to 800 ml of NZY culture medium, without antibiotics. The transformed culture is incubated at 37° C. for 60 minutes, then diluted to 10 litres by addition of NZY medium containing 40 μg/ml of ampicillin. After 3-5 hours of incubation at 37° C., the amplified culture is centrifuged, and the pellet of transformed cells is lyophilized and stored at −70° C. Such a culture contains 3×10⁷ to 10⁸ independent transformants, each containing a unique stochastic gene inserted into the expression vector.

[0054] II) Synthesis of Stochastic Genes Starting from Oligonucleotides Without Cohesive Ends.

[0055] This procedure is based on the fact that polymerization of judiciously chosen palindromic oligonucleotides permits construction of stochastic genes which have no “stop” codon in any of the six possible reading frames, while at the same time assuring a balanced representation of triplets specifying all amino acids. Further, and to avoid a repetition of sequence motifs in the proteins which result, the oligonucleotides can contain a number of bases which is not a multiple of three. The example which follows describes the use of one of the possible combinations which fulfil these criteria:

[0056] a) Choice of a Group of Octamers

[0057] The group of oligonucleotides following:
5′ GGAATTCC 3′
5′ GGTCGACC 3′
5′ CAAGCTTG 3′
5′ CCATATGG 3′
5′ CATCGATG 3′

[0058] is composed of 5 palindromes (thus self-complementary sequences) where it is easy to verify that their stochastic polymerization does not generate any “stop” codons, and specifies all the amino acids.
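The verification mentioned above can be sketched programmatically. Because each octamer is longer than a codon, any triplet in a polymer spans at most two adjacent octamers, so it suffices to examine every ordered pair on both strands. The code below is a minimal sketch using the standard genetic code; the function and variable names are hypothetical.

```python
from itertools import product

OCTAMERS = ["GGAATTCC", "GGTCGACC", "CAAGCTTG", "CCATATGG", "CATCGATG"]

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def triplets_in_polymers(oligos):
    """Collect every triplet that can occur, in any frame and on either
    strand, in an end-to-end polymer of the given oligonucleotides."""
    found = set()
    for first, second in product(oligos, repeat=2):
        pair = first + second
        for strand in (pair, reverse_complement(pair)):
            found.update(strand[i:i + 3] for i in range(len(strand) - 2))
    return found


all_palindromic = all(reverse_complement(o) == o for o in OCTAMERS)
triplets = triplets_in_polymers(OCTAMERS)
stops = sorted(t for t in triplets if CODON_TABLE[t] == "*")
amino_acids = {CODON_TABLE[t] for t in triplets} - {"*"}

print("all palindromic:", all_palindromic)
print("stop codons found:", stops)
print("amino acids specified:", len(amino_acids))
```

For this group of five octamers the check finds no stop triplet in any frame, and the triplets that do occur cover all twenty amino acids.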

[0059] Obviously, one can utilize other groups of palindromic octamers which do not generate any “stop” codons and specify all the amino acids found in polypeptides. Clearly, it is also possible to utilize non-palindromic groups of octamers, or other oligomers, under the condition that their complements forming double stranded DNA are also used.

[0060] b) Assembly of a Stochastic Gene from a Group of Octamers.

[0061] A mixture containing 5 μg each of the oligonucleotides indicated above (previously phosphorylated at the 5′ position by a standard procedure) is reacted in a 100 μl volume containing 1 mM ATP, 10% polyethylene glycol, and 100 units of T4 DNA ligase in the appropriate buffer at 13° C. for six hours. This step carries out the stochastic polymerization of the oligomers in the double stranded state and without cohesive ends. The resulting polymers are isolated by passage over a molecular sieve (Biogel P60) recovering those with 20 to 100 oligomers. After concentration, this fraction is again submitted to catalysis or polymerization by T4 DNA ligase under the conditions described above. Thereafter, as described above, those polymers which have assembled at least 100 oligomers are isolated.

[0062] c) Preparation of the Host Plasmid

[0063] The pUC8 expression vector is linearized by Sma1 enzyme in the appropriate buffer, as described above. The vector linearized by Sma1 does not have cohesive ends. Thus the linearized vector is treated by calf intestine alkaline phosphatase (CIP) at a level of one unit per microgram of vector in the appropriate buffer, at 37° C. for 30 minutes. The CIP enzyme is thereafter inactivated by two successive extractions with phenol-chloroform. The linearized and dephosphorylated vector is precipitated in ethanol, then redissolved in water at 1 mg/ml.

[0064] d) Ligation of Stochastic Genes to the Vector

[0065] Equimolar quantities of vector and polymers are mixed and incubated in the presence of 1000 units of T4 DNA ligase, 1 mM ATP, 10% polyethylene glycol, in the appropriate buffer, for 12 hours at 13° C. This step ligates the stochastic polymers into the expression vector and forms double stranded circular molecules which are, therefore, capable of transforming.

[0066] Transformation of Competent Clones.

[0067] Transformation of competent clones is carried out in the manner previously described.

[0068] III) Assembly of Stochastic Genes Starting from a Group of Heptamers.

[0069] This procedure differs from that just discussed in that it utilizes palindromic heptamers which have variable cohesive ends, in place of stochastic sequences containing a smaller number of identical motifs.

[0070] a) Choice of a Group of Heptamers

[0071] It is possible, as an example, to use the following three palindromic heptamers:
5′ XTCGCGA 3′
5′ XCTGCAG 3′
5′ RGGTACC 3′

[0072] where X=A, G, C, or T and R=A or T, and where polymerization cannot generate any “stop” codons and forms triplets specifying all the amino acids. Clearly, it is possible to use other groups of heptamers fulfilling these same conditions.

[0073] b) Polymerization of a Group of Heptamers

[0074] This polymerization is carried out exactly in the fashion described above for octamers.

[0075] c) Elimination of Cohesive Extremities

[0076] The polymers thus obtained have one unpaired base on their two 5′ extremities. Thus, it is necessary to add the complementary base to the corresponding 3′ extremities. This is carried out as follows: 10 micrograms of the double stranded polymers are reacted with 10 units of the Klenow enzyme, in the presence of the four deoxynucleotide phosphates (200 mM) in a volume of 100 μl, at 4° C., for 60 minutes. The enzyme is inactivated by phenol chloroform extraction, and the polymers are cleansed of the residual free nucleotides by differential precipitation. The polymers are then ligated to the host plasmid (previously linearized and dephosphorylated) by following the procedures described above.

[0077] It is to be noted that the two last procedures which were described utilize palindromic octamers or heptamers which constitute specific sites of certain restriction enzymes. These sites are absent, for the most part, from the pUC8 expression vector. Thus, it is possible to augment considerably the complexity of an initial preparation of stochastic genes by proceeding in the following way: the plasmid DNA derived from the culture of 10⁷ independent transformants obtained by one of the two last procedures described above, is isolated. After this DNA is purified, it is partially digested by Cla1 restriction enzyme (procedure II) or by the Pst1 restriction enzyme (procedure III). After inactivation of the enzyme, partially digested DNA is treated with T4 DNA ligase, which has the effect of creating a very large number of new sequences, while conserving the fundamental properties of the initial sequences. This new ensemble of stochastic sequences can then be used to transform competent cells. In addition, the stochastic genes cloned by procedure II and III can be excised intact from the pUC8 expression vector by utilizing restriction sites belonging to the cloning vector and not represented in the stochastic DNA sequences.
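The way partial digestion and religation create new sequences while conserving the properties of the initial ones can be illustrated for procedure II. The sketch below is a deliberately simplified model, not the laboratory procedure: it treats one cut-and-religate event as an exchange of tails between two genes at a Cla1 site (recognition sequence ATCGAT, cut between AT and CGAT, present in the octamer CATCGATG), and it ignores fragment inversion and multi-fragment products. The example genes and function names are hypothetical.

```python
import random

OCTAMERS = ["GGAATTCC", "GGTCGACC", "CAAGCTTG", "CCATATGG", "CATCGATG"]
CLAI_SITE = "ATCGAT"  # Cla1 recognition sequence, found inside CATCGATG
CUT_OFFSET = 2        # Cla1 cuts between AT and CGAT


def cut_positions(gene):
    """Return every index at which Cla1 could cut the top strand of gene."""
    positions = []
    start = gene.find(CLAI_SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = gene.find(CLAI_SITE, start + 1)
    return positions


def recombine(gene_a, gene_b, rng):
    """Cut each gene at one randomly chosen Cla1 site and swap the tails."""
    cuts_a, cuts_b = cut_positions(gene_a), cut_positions(gene_b)
    if not cuts_a or not cuts_b:
        return gene_a, gene_b  # no site in one gene: nothing to exchange
    i, j = rng.choice(cuts_a), rng.choice(cuts_b)
    return gene_a[:i] + gene_b[j:], gene_b[:j] + gene_a[i:]


def is_octamer_chain(gene):
    """True if gene is an end-to-end concatenation of the five octamers."""
    return len(gene) % 8 == 0 and all(
        gene[k:k + 8] in OCTAMERS for k in range(0, len(gene), 8)
    )


# Two short hypothetical stochastic genes, each containing one Cla1 site.
gene_a = "GGAATTCC" + "CATCGATG" + "GGTCGACC"
gene_b = "CCATATGG" + "CATCGATG" + "CAAGCTTG"
new_a, new_b = recombine(gene_a, gene_b, random.Random(0))

print(new_a, is_octamer_chain(new_a))
print(new_b, is_octamer_chain(new_b))
```

Each product is a sequence absent from the starting pair, yet it remains a chain of the original octamers, since religation at the Cla1 site restores the octamer in which the cut was made.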

[0078] Recombination within the stochastic genes generated by the two procedures just described, which results from the internal homology due to the recurrent molecular motifs, is an important additional method to achieve in vivo mutagenesis of the coding sequences. This results in an augmentation of the number of the new genes which can be examined.

[0079] Finally, for all the procedures to generate novel synthetic genes, it is possible to use a number of common techniques to modify genes in vivo or in vitro, such as a change of reading frame, inversion of sequences with respect to their promoter, point mutations, or utilization of host cells expressing one or several suppressor tRNAs.

[0080] In considering the above description, it is clear that it is possible to construct, in vitro, an extremely large number (for example, greater than a billion) of different genes, by enzymatic polymerization of nucleotides or of oligonucleotides. This polymerization is carried out in a stochastic manner, as determined by the respective concentrations of the nucleotides or oligonucleotides present in the reaction mixture.
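The order of magnitude quoted above can be related to polymer length by simple counting. The sketch below assumes that every position is filled independently from four nucleotides or from five octamers, which gives an upper bound on the number of possible sequences; the number of genes actually obtained is also limited by the number of molecules and transformants. The function name is hypothetical.

```python
def min_units_exceeding(target, alphabet_size):
    """Smallest number of independently chosen units for which the count of
    possible sequences (alphabet_size ** units) exceeds target."""
    units, count = 0, 1
    while count <= target:
        units += 1
        count *= alphabet_size
    return units


BILLION = 10**9
nucleotides_needed = min_units_exceeding(BILLION, 4)  # A, C, G, T
octamers_needed = min_units_exceeding(BILLION, 5)     # five palindromic octamers

print(nucleotides_needed, "nucleotides give", 4**nucleotides_needed, "sequences")
print(octamers_needed, "octamers give", 5**octamers_needed, "sequences")
```

On this count, polymers of a few hundred nucleotides or of a hundred octamers span a sequence space far larger than a billion.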

[0081] As indicated above, two methods can be utilized to clone such genes (or coding sequences): the polymerization can be carried out directly on a cloning expression vector, which was previously linearized; or it is possible to proceed sequentially to the polymerization then the ligation of the polymers to the expression vector.

[0082] In the two cases, the next step is transformation or transfection of competent bacterial cells (or cells in culture). This step constitutes cloning the stochastic genes in living cells where they are indefinitely propagated and expressed.

[0083] Clearly, in addition to the procedures which were described above, it is feasible to use all other methods which are appropriate for the synthesis of stochastic sequences. In particular, it is possible to carry out polymerization, by biochemical means, of single stranded oligomers of DNA or RNA obtained by chemical synthesis, then treat these segments of DNA or RNA by established procedures to generate double stranded DNA (cDNA) in order to clone such genes.

[0084] Screening or Selection of Clones of Transformed Host Cells

[0085] The further step of the procedure according to the invention consists in examining the transformed or transfected cells by selection or screening, in order to isolate one or several cells whose transforming or transfecting DNA leads to the synthesis of a transcription product (RNA) or translation product (protein) having a desired property. These properties can be, for example, enzymatic, functional, or structural.

[0086] One of the most important aspects of the process, according to the invention, is that it permits the simultaneous screening or selection of an exploitable product (RNA or protein) and the gene which produces that product. In addition, the DNA synthesized and cloned as described, can be selected or screened in order to isolate sequences of DNA constituting products in themselves, having exploitable biochemical properties.

[0087] We shall now describe, as non-limiting examples, preferred procedures for screening and/or selection of clones of transformed cells such that the novel proteins are of interest from the point of view of industrial or medical applications.

[0088] One of these procedures rests on the idea of producing, or obtaining, polyclonal or monoclonal antibodies, by established techniques, directed against a protein or another type of molecule of biochemical or medical interest, where that molecule is, or has been rendered, immunogenic, and thereafter using these antibodies as probes to identify among the very large number of clones transformed by stochastic genes, those whose proteins react with these antibodies. This reaction is a result of a structural homology which exists between the polypeptide synthesized by the stochastic gene and the initial molecule. It is possible in this way to isolate numbers of novel proteins which behave as epitopes or antigenic determinants on the initial molecule. Such novel proteins are liable to simulate, stimulate, modulate, or block the effect of the initial molecule. It will be clear that this means of selection or screening may itself have very many pharmacologic and biochemical applications. Below we describe, as a non-limiting example, this first mode of operation in a concrete case:

[0089] EGF (epidermal growth factor) is a small protein present in theblood, whose role is to stimulate the growth of epithelial cells. Thiseffect is obtained by the interaction of EGF with a specific receptorsituated in the membrane of epithelial cells. Antibodies directedagainst EGF are prepared by injecting animals with EGF coupled to KLH(keyhole limpet hemocyanin) to augment the immunogenicity of the EGF.The anti-EGF antibodies of the immunized animals are purified, forexample, by passage over an affinity column, where the ligand is EGF ora synthetic peptide corresponding to a fragment of EGF. The purifiedanti-EGF antibodies are then used as probes to screen a large number ofbacterial clones lysed by chloroform, and on a solid support. Theanti-EGF antibodies bind those stochastic peptides or proteins whoseepitopes resemble those of the initial antigen. The clones containingsuch peptides or proteins are shown by autoradiography after incubationof the solid support with radioactive protein A, or after incubationwith a radioactive antibody.

[0090] These steps identify those clones, each of which contains oneprotein (and its gene) reacting with the screening antibody. It isfeasible to screen among a very large number of colonies of bacterialcells or viral plaques (for example, on the order of 1,000,000) and itis feasible to detect extremely small quantities, on the order of 1nanogram, of protein product. Thereafter, the identified clones arecultured and the proteins so detected are purified in conventional ways.These proteins are tested in vitro in cultures of epithelial cells todetermine if they inhibit, simulate, or modulate the effects of EGF onthese cultures. Among the proteins so obtained, some may be utilized forthe chemotherapeutic treatment of epitheliomas. The activities of theproteins thus obtained can be improved by mutation of the DNA coding forthe proteins, in ways analogous to those described above. A variant ofthis procedure consists in purifying these stochastic peptides,polypeptides, or proteins, which can be used as vaccines or moregenerally, to confer an immunity against a pathogenic agent or toexercise other effects on the immunological system, for example, tocreate a tolerance or diminish hypersensitivity with respect to a givenantigen, in particular, due to binding of these peptides, polypeptides,or proteins with the antibodies directed against this antigen. It isclear that it is possible to use such peptides, polypeptides, orproteins in vitro as well as in vivo.

[0091] More precisely, in the ensemble of novel proteins which react with the antibodies against a given antigen X, each has at least one epitope in common with X, thus the ensemble has an ensemble of epitopes in common with X. This permits utilization of the ensemble or sub-ensemble as a vaccine to confer immunity against X. It is, for example, easy to purify one or several of the capsid proteins of the hepatitis B virus. These proteins can then be injected into an animal, for example, a rabbit, and the antibodies corresponding to the initial antigen can be recovered by affinity column purification. These antibodies may be used, as described above, to identify clones producing at least one protein having an epitope resembling at least one of the epitopes of the initial antigen. After purification, these proteins are used as antigens (either alone or in combination) with the aim of conferring protection against hepatitis B. The final production of the vaccine does not require further access to the initial pathogenic agent.

[0092] Note that, during the description of the procedures above, a number of means to achieve selection or screening have been described. All these procedures may require the purification of a particular protein from a transformed clone. These protein purifications can be carried out by established procedures and utilize, in particular, the techniques of gel chromatography, ion exchange chromatography, and affinity chromatography. In addition, the proteins generated by the stochastic genes can be cloned in the form of hybrid proteins having, for example, a sequence of the β-galactosidase enzyme which permits affinity chromatography against anti-β-galactosidase antibodies, and allows the subsequent cleavage of the hybrid part; that is to say, allowing separation of the novel part and the bacterial part of the hybrid protein. Below we describe the principles and procedures for selection of peptides or polypeptides and the corresponding genes, according to a second method of screening or selection based on the detection of the capacity of these peptides or polypeptides to catalyse a specific reaction.

[0093] As a concrete and non-limiting example, screening or selection in the particular case of proteins capable of catalyzing the cleavage of lactose, normally a function fulfilled by the enzyme β-galactosidase (β-gal), will be described.

[0094] As described above, the first step of the process consists in generating a very large ensemble of expression vectors, each expressing a distinct novel protein. To be concrete, for example, one may choose the pUC8 expression vector with cloning of stochastic sequences of DNA in the PstI restriction site. The plasmids thus obtained are then introduced into a clone of E. coli from whose genome the natural gene for β-galactosidase, Z, and a second gene EBG, unrelated to the first but able to mutate towards β-gal function, have both been eliminated by known genetic methods. Such host cells (Z⁻, EBG⁻) are not able by themselves to catalyse lactose hydrolysis, and as a consequence, to use lactose as a carbon source for growth. This permits utilization of such host clones for screening or selection for β-gal function.

[0095] A convenient biological assay to analyze transformed E. coli clones for those which have novel genes expressing a β-gal function consists in the culture of bacteria transformed as described in petri dishes containing X-gal in the medium. In this case, all bacterial colonies expressing a β-gal function are visualized as blue colonies. By using such a biological assay, it is possible to detect even weak catalytic activity. The specific activity of characteristic enzymes ranges from 10 to 10,000 product molecules per second.

[0096] Supposing that a protein synthesized by a stochastic gene has a weak specific activity, on the order of one molecule per 100 seconds, it remains possible to detect such catalytic activity. In a petri dish containing X-gal in the medium, in the presence of the non-metabolizable inducer IPTG (isopropyl-D-thiogalactoside), visualization of a blue region requires cleavage of about 10¹⁰ to 10¹¹ molecules of X-gal per square millimeter. A bacterial colony expressing a weak enzyme and occupying a surface area of 1 mm² has about 10⁷ to 10⁸ cells. If each cell has only one copy of the weak enzyme, each cell would need to catalyse cleavage of between 100 and 10,000 molecules of X-gal to be detected, which would require between 2.7 and 270 hours. Since under selective conditions it is possible to amplify the number of copies of each plasmid per cell from 5 to 20 copies per cell, or even to 100 or 1000, and because up to 10% of the protein of the cell can be specified by the new gene, the duration needed to detect a blue colony in the case of 100 enzyme molecules of weak activity per cell is on the order of 0.027 to 2.7 hours.
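
By way of illustration only, the arithmetic of this estimate can be sketched in Python. The function and variable names are hypothetical and form no part of the procedure; the numerical inputs (molecules of X-gal per square millimeter, cells per colony, one product molecule per 100 seconds) are those stated in the paragraph above, and the model assumes that every enzyme molecule works continuously at the stated rate.

```python
# Illustrative sketch: time for a 1 mm^2 colony to cleave enough X-gal to appear blue.
# All names are hypothetical; the inputs are the figures quoted in the text.
SECONDS_PER_HOUR = 3600.0
WEAK_TURNOVER_PER_S = 1.0 / 100.0  # one product molecule per 100 seconds


def detection_time_hours(molecules_per_mm2, cells_per_mm2, enzymes_per_cell=1,
                         turnover_per_s=WEAK_TURNOVER_PER_S):
    """Hours needed, assuming each enzyme molecule works at a constant rate."""
    molecules_per_cell = molecules_per_mm2 / cells_per_mm2
    seconds = molecules_per_cell / (enzymes_per_cell * turnover_per_s)
    return seconds / SECONDS_PER_HOUR


# Most and least favorable pairings of the quoted ranges.
fastest_single = detection_time_hours(1e10, 1e8)  # 100 molecules per cell
slowest_single = detection_time_hours(1e11, 1e7)  # 10,000 molecules per cell
fastest_hundred = detection_time_hours(1e10, 1e8, enzymes_per_cell=100)
slowest_hundred = detection_time_hours(1e11, 1e7, enzymes_per_cell=100)

print(f"1 enzyme per cell: {fastest_single:.2f} to {slowest_single:.0f} hours")
print(f"100 enzymes per cell: {fastest_hundred:.3f} to {slowest_hundred:.2f} hours")
```

Raising the number of enzyme molecules per cell shortens the detection time in direct proportion, which is why plasmid amplification makes even very weak activities visible within hours.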

[0097] As a consequence of these facts, screening a very large number of independent bacterial colonies, each expressing a different novel gene, and using the capacity to express a β-gal function as the selection criterion, is fully feasible. It is possible to carry out screening of about 2000 colonies in one Petri dish of 10 cm diameter. Thus, about 20 million colonies can be screened on a sheet of X-gal agar of 1 square meter.

[0098] It is to be noted that bacterial colonies which appear blue on X-gal Petri dishes might be false positives due to a mutation in the bacterial genome which confers upon it the capacity to metabolize lactose, or for reasons other than a catalytic activity of the novel protein expressed by the cells of the colony. Such false positives can be directly eliminated by purifying the DNA of the expression vector from the positive colony, and retransforming Z⁻, EBG⁻ E. coli host cells. If the β-gal activity is due to the novel protein coded by the new gene in the expression vector, all those cells transformed by that vector will exhibit β-gal function. In contrast, if the initial blue colony is due to a mutation in the genome of the host cell, it is a rare event and independent of the transformation, thus the number of cells of the new clone of the transformed E. coli capable of expressing β-gal function will be small or zero.

[0099] The power of mass simultaneous purification of all the expression vectors from all the positive clones (blue) followed by retransformation of naive bacteria should be stressed. Suppose that the aim is to carry out a screening to select proteins having a catalytic function, and that the probability that a new peptide or polypeptide carries out this function at least weakly is 10⁻⁶, while the probability that a clone of the E. coli bacterial host undergoes a mutation rendering it capable of carrying out the same function is 10⁻⁵, then it can be calculated that among 20 million transformed bacteria which are screened, 20 positive clones will be attributable to the novel genes in expression vectors which each carries, while 200 positive clones will be the result of genomic mutation. Mass purification of the expression vectors from the total of 220 positive bacterial clones followed by retransformation of the naive bacteria with the mixture of these expression vectors will produce a large number of positive clones consisting of all those bacteria transformed with the 20 expression vectors which code for the novel proteins having the desired function, and a very small number of bacterial clones resulting from genomic mutations and containing the 200 expression vectors which are not of interest. A small number of cycles of purification of expression vectors from positive bacterial colonies, followed by such retransformation, allows the detection of very rare expression vectors truly positive for a desired catalytic activity, despite a high background rate of mutations in the host cells for the same function.
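
The counts given above, and the enrichment obtained by one round of retransformation, may be checked with the following Python sketch. It is offered as an illustration only: the function names are hypothetical, and it assumes (an assumption not stated in the text) that the 220 recovered vectors are taken up in equal proportions by the naive bacteria and that host mutations arise independently of the vector carried.

```python
# Illustrative sketch of the expected-positive counts and of one retransformation cycle.
# Names are hypothetical; the probabilities are those quoted in the text.

def expected_positives(n_screened, p_novel_gene, p_host_mutation):
    """Expected positives due to novel genes and due to host genomic mutation."""
    return n_screened * p_novel_gene, n_screened * p_host_mutation


def true_fraction_after_retransformation(true_vectors, false_vectors, p_host_mutation):
    """Fraction of positive clones that carry a truly positive vector after
    naive bacteria are retransformed with the pooled vectors.
    Assumes equal uptake of every vector (an assumption of this sketch)."""
    share_true = true_vectors / (true_vectors + false_vectors)
    positives_true = share_true                             # always positive
    positives_false = (1.0 - share_true) * p_host_mutation  # only by rare mutation
    return positives_true / (positives_true + positives_false)


true_pos, false_pos = expected_positives(20_000_000, 1e-6, 1e-5)
enriched = true_fraction_after_retransformation(20, 200, 1e-5)

print(f"first screen: {true_pos:.0f} true and {false_pos:.0f} false positives")
print(f"true fraction before retransformation: {true_pos / (true_pos + false_pos):.3f}")
print(f"true fraction after retransformation: {enriched:.6f}")
```

Under these assumptions a single cycle raises the share of truly positive clones from about one in eleven to nearly all of the positives, which is the effect the paragraph describes.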

[0100] Following screening operations of this type, it is possible to purify the new protein by established techniques. The production of that protein in large quantity is made possible by the fact that identification of the useful protein occurs together with simultaneous identification of the gene coding for the same protein. Consequently, either the same expression vector can be used, or the novel gene can be transplanted into a more appropriate expression vector for its synthesis and isolation in large quantity.

[0101] It is feasible to apply this method of screening for any enzymatic function for which an appropriate biological assay exists. For such screenings, it is not necessary that the enzymatic function which is sought be useful to the host cell. It is possible to carry out screenings not only for an enzymatic function but for any other desired property for which it is possible to establish an appropriate biological assay. It is thus feasible to carry out, even in the simple case of β-gal function visualized on an X-gal Petri plate, a screening of on the order of 100 million, or even a billion novel genes for a catalytic activity or any other desired property.

[0102] Selection of Transformed Host Cells.

[0103] On the other hand, it is possible to use selection techniques for any property, catalytic or otherwise, where the presence or absence of the property can be rendered essential for the survival of the host cells containing the expression vectors which code for the novel genes, or also can be used to select for those viruses coding and expressing the desired novel gene. As a non-limiting, but concrete example, selection for β-galactosidase function shall be described. An appropriate clone of Z⁻ EBG⁻ E. coli is not able to grow on lactose as the sole carbon source. Thus, after carrying out the first step described above, it is possible to culture a very large number of host cells transformed by the expression vectors coding for the novel genes, under selective conditions, either by progressive diminution of other sources of carbon, or utilization of lactose alone from the start. During the course of such selection, in vivo mutagenesis by recombination, or by explicitly recovering the expression vectors and mutagenizing their novel genes in vitro by various mutagens, or by any other common technique, permits adaptive improvements in the capacity to fulfill the desired catalytic function. When both selection techniques and convenient bioassay techniques exist at the same time, as in the present case, it is possible to use selection techniques initially to enrich the representation of host bacteria expressing the β-gal function, then carry out a screening on a Petri plate on X-gal medium to establish efficiently which are the positive cells. In the absence of convenient bioassays, application of progressively stricter selection is the easiest route to purify one or a small number of distinct host cells whose expression vectors code for the proteins catalyzing the desired reaction.

[0104] It is possible to utilize these techniques to find novel proteins having a large variety of structural and functional characteristics beyond the capacity to catalyse a specific reaction. For example, it is possible to carry out a screen or select for novel proteins which bind to cis-regulatory sites on the DNA and thereby block the expression of one of the host cell's functions, or block transcription of the DNA, stimulate transcription, etc.

[0105] For example, in the case of E. coli, a clone mutant in the repressor of the lactose operon (i⁻) expresses β-gal function constitutively due to the fact that the lactose operator is not repressed. All cells of this type produce blue clones on Petri plates containing X-gal medium. It is possible to transform such host cells with expression vectors synthesizing novel proteins and carry out a screen on X-gal Petri plates in order to detect those clones which are not blue. Among those, some represent the case where the new protein binds to the lactose operator and represses the synthesis of β-gal. It is then feasible to mass isolate such plasmids, retransform, isolate those clones which do not produce β-gal, and thereafter carry out a detailed verification.

[0106] As mentioned above, the process can be utilized in order to create, then isolate, not only exploitable proteins, but also RNA and DNA as products in themselves, having exploitable properties. This results from the fact that, on one hand, the procedure consists in creating stochastic sequences of DNA which may interact directly with other cellular or biochemical constituents, and on the other hand, these sequences cloned in expression vectors are transcribed into RNAs which are themselves capable of multiple biochemical interactions.

[0107] An Example of the Use of the Procedure to Create and Select for a DNA Which is Useful in Itself.

[0108] This example illustrates selection for a useful DNA, and the purification and study of the mechanism of action of regulatory proteins which bind to the DNA. Consider a preparation of the oestradiol receptor, a protein obtained by standard techniques. In the presence of oestradiol, a steroid sexual hormone, the receptor changes conformation and binds tightly to certain specific sequences in the genomic DNA, thus affecting the transcription of genes implicated in sexual differentiation and the control of fertility. By incubating a mixture containing oestradiol, its receptor, and a large number of different stochastic DNA sequences inserted in their vectors, followed by filtration of the mixture across a nitrocellulose membrane, one has a direct selection for those stochastic DNA sequences binding to the oestradiol-receptor complex, where only those DNAs bound to a protein are retained by the membrane. After washing and elution, the DNA liberated from the membrane is utilized as such to transform bacteria. After culture of the transformed bacteria, the vectors which they contain are again purified and one or several cycles of incubation, filtration, and transformation are carried out as described above. These procedures allow the isolation of stochastic sequences of DNA having an elevated affinity for the oestradiol-receptor complex. Such sequences are open to numerous diagnostic and pharmacologic applications, in particular, for developing synthetic oestrogens for the control of fertility and treatment of sterility.

[0109] Creation and Selection of an RNA Useful in Itself

[0110] Let there be a large number of stochastic DNA sequences, produced as has been described and cloned in an expression vector. It follows that the RNA transcribed from these sequences in the transformed host cells can be useful products themselves. As a non-limiting example, it is possible to select a stochastic gene coding for a suppressor transfer RNA (tRNA) by the following procedure: a large number (≧10⁸) of stochastic sequences are transformed into competent bacterial hosts carrying a “nonsense” mutation in the arg E gene. These transformed bacteria are plated on minimal medium without arginine and with the selective antibiotic for that plasmid (ampicillin if the vector is pUC8). Only those transformed bacteria which have become capable of synthesizing arginine will be able to grow. This phenotype can result either from a back mutation of the host genome, or from the presence in the cell of a suppressor. It is easy to test each transformed colony to determine if the arg+ phenotype is or is not due to the presence of the stochastic gene in its vector; it suffices to purify the plasmid from this colony and verify that it confers an arg+ phenotype on all arg E cells transformed by it.

[0111] Selection of Proteins Capable of Catalyzing a Sequence of Reactions

[0112] Below we describe another means of selection, open to independent applications, based on the principle of simultaneous and parallel selection of a certain number of novel proteins capable of catalyzing a connected sequence of reactions.

[0113] The basic idea of this method is the following: given an initial ensemble of chemical compounds considered as building blocks or elements of construction from which it is hoped to synthesize one or several desired chemical compounds by means of a catalyzed sequence of chemical reactions, there exists a very large number of reaction routes which can be partially or completely substituted for one another, which are all thermodynamically possible, and which lead from the set of building blocks to the desired target compound(s). Efficient synthesis of a target compound is favored if each step of at least one reaction pathway leading from the building block compounds to the target compound is comprised of reactions each of which is catalyzed. On the other hand, it is relatively less important to determine which among the many independent or partially independent reaction pathways is catalyzed. In the previous description, we have shown how it is possible to obtain a very large number of host cells each of which expresses a distinct novel protein.

[0114] Each of these novel proteins is a candidate to catalyse one or another of the possible reactions, in the set of all the possible reactions leading from the ensemble of building blocks to the target compound. If a sufficiently large number of stochastic proteins is present in a reaction mixture containing the building block compounds, such that a sufficiently large number of the possible reactions are catalyzed, there is a high probability that one connected sequence of reactions leading from the set of building block compounds to the target compound will be catalyzed by a subset of the novel proteins. It is clear that this procedure can be extended to the catalysis not only of one, but of several target compounds simultaneously.

[0115] Based on this principle it is possible to proceed as follows in order to select in parallel a set of novel proteins catalyzing a desired sequence of chemical reactions:

[0116] 1. Specify the desired set of compounds constituting the building blocks, utilizing preferentially a reasonably large number of distinct chemical species in order to increase the number of potential concurrent reactions leading to the desired target compound.

[0117] 2. Using an appropriate volume of reaction medium, add a very large number of novel stochastic proteins isolated from transformed or transfected cells synthesizing these proteins. Carry out an assay to determine if the target compound is formed. If it is, confirm that this formation requires the presence of the mixture of novel proteins. If so, then the mixture should contain a subset of proteins catalyzing one or several reaction pathways leading from the building block set to the target compound. Purify, by successive division of the initial ensemble of clones which synthesize the set of novel stochastic proteins, the subset which is required to catalyse the sequence of reactions leading to the target compound.

[0118] More precisely, as a non-limiting example, below we describe selection of novel proteins capable of catalyzing the synthesis of a specific small peptide, in particular, a pentapeptide, starting from a building block set constituted of smaller peptides and amino acids. All peptides are constituted by a linear sequence of 20 different types of amino acids, oriented from the amino to the carboxy terminus. Any peptide can be formed in a single step by the terminal condensation of two smaller peptides (or of two amino acids), or by hydrolysis of a larger peptide. A peptide with M residues can thus be formed by M−1 condensation reactions. The number of reactions, R, by which a set of peptides having length 1, 2, 3, . . . M residues can be interconverted is larger than the number of possible molecular species, T. This can be expressed as R/T=M−2. Thus, starting from a given ensemble of peptides, a very large number of independent or partially independent reaction pathways leads to the synthesis of a specific target peptide. Choose a pentapeptide whose presence can be determined conveniently by some common assay technique, for example HPLC (high pressure liquid chromatography), paper chromatography, etc. Formation of a peptide bond requires energy in a dilute aqueous medium, but if the peptides participating in the condensation reactions are adequately concentrated, formation of peptide bonds is thermodynamically favored over hydrolysis and occurs efficiently in the presence of an appropriate enzymatic catalyst, for example pepsin or trypsin, without requiring the presence of ATP or other high energy compounds. A reaction mixture of such small peptides, constituting the building block set, whose amino acids are radioactively marked with ³H, ¹⁴C, or ³⁵S to act as tracers, can be used at sufficiently high concentrations to lead to condensation reactions.
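
As a minimal illustration of the statement that a peptide with M residues can be formed in a single step by M−1 different condensations, the following Python sketch lists the pairs of fragments whose terminal condensation yields a given peptide, and counts the peptide species of a given maximum length. The target sequence shown is hypothetical and chosen only for the example; the function names are likewise not part of the procedure.

```python
# Illustrative sketch: single-step condensation routes to a target peptide.
# The target sequence and all names are hypothetical.

def single_step_condensations(peptide):
    """All (amino-terminal fragment, carboxy-terminal fragment) pairs whose
    terminal condensation yields the given peptide."""
    return [(peptide[:i], peptide[i:]) for i in range(1, len(peptide))]


def species_up_to(max_length, amino_acid_types=20):
    """Number of distinct linear peptides having 1 to max_length residues."""
    return sum(amino_acid_types ** n for n in range(1, max_length + 1))


target = "GAVLI"  # hypothetical pentapeptide in one-letter code
for left, right in single_step_condensations(target):
    print(f"{left} + {right} -> {target}")

print(len(single_step_condensations(target)), "single-step condensations")
print(species_up_to(5), "peptide species of length 1 to 5")
```

Each fragment in such a pair can in turn be formed by condensation of still smaller fragments, which is why the number of pathways leading to one target grows rapidly with its length.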

[0119] For example, it is feasible to proceed as follows: 15 mg of each amino acid and small peptide having 2 to 4 amino acids, chosen to constitute the building block set, are dissolved in a volume of 0.25 ml to 1.0 ml of a 0.1M pH 7.6 phosphate buffer. A large number of novel proteins, generated and isolated as described above, are purified from their bacterial or other host cells. The mixture of these novel proteins is dissolved to a final concentration on the order of 0.8 to 1.0 mg/ml in the same buffer. 0.25 ml to 0.5 ml of the protein mixture is added to the mixture of building blocks. This is incubated at 25° C. to 40° C. for 1 to 40 hours. Aliquots of 8 μl are removed at regular intervals; the first is used as a “blank” and taken before addition of the mixture of novel proteins. These aliquots are analyzed by chromatography using n-butanol-acetic acid-pyridine-water (30:6:20:24 by volume) as the solvent. The chromatogram is dried and analyzed by ninhydrin or autoradiography (with or without intensifying screens). Because the compounds constituting the building block set are radioactively marked, the target compound will be radioactive and it will have specific activity high enough to permit detection at the level of 1-10 ng. In place of standard chromatographic analysis, it is possible to use HPLC (high pressure liquid chromatography) which is faster and simpler to carry out. More generally, all the usual analytic procedures can be employed. Consequently it is possible to detect a yield of the target compound of less than one part per million by weight compared to the compounds used as initial building blocks.

[0120] If the pentapeptide is formed in the conditions described above, but not when an extract is utilized which is derived from host cells transformed by an expression vector containing no stochastic genes, the formation of the pentapeptide is not the result of bacterial contaminants and thus requires the presence of a subset of the novel proteins in the reaction mixture.

[0121] The following step consists in the separation of the particular subset of cells which contain expression vectors with the novel proteins catalyzing the sequence of reactions leading to the target pentapeptide. As an example, if the number of reactions forming this sequence is 5, there are about 5 novel proteins which catalyse the necessary reactions. If the clone bank of bacteria containing the expression vectors which code for the novel genes has a number of distinct novel genes which is on the order of 1,000,000, all these expression vectors are isolated en masse and retransformed into 100 distinct sets of 10⁸ bacteria at a ratio of vectors to bacteria which is sufficiently low that, on average, the number of bacteria in each set which are transformed is about half the number of initial genes, i.e. about 500,000. Thus, the probability that any given one of the 100 sets of bacteria contains the entire set of 5 critical novel proteins is (½)⁵ = 1/32. Among the 100 initial sets of bacteria, about 3 will contain the 5 critical transformants. In each of these sets, the total number of new genes is only 500,000 rather than 1,000,000. By successive repetitions, the total number of which is about 20 in the present case, this procedure isolates the 5 critical novel genes. Following this, mutagenesis and selection on this set of 5 stochastic genes allows improvement of the necessary catalytic functions. In a case where it is necessary to catalyse a sequence of 20 reactions and 20 genes coding novel proteins need to be isolated in parallel, it suffices to adjust the multiplicity of transformation such that each set of 10⁸ bacteria receives 80% of the 10⁶ stochastic genes, and to use 200 such sets of bacteria. The probability that all 20 novel proteins are found in one set is 0.8²⁰≅0.012. Thus, about 2 among the 200 sets will have the 20 novel genes which are needed to catalyse the formation of the target compound. The number of cycles required for isolation of the 20 novel genes is on the order of 30.
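
The probabilities used in this sequential diminution can be reproduced with the short Python sketch below, given as an illustration only. The function names are hypothetical. The sketch assumes that each critical gene is retained in a given set independently with probability equal to the fraction of the library received by that set, and, for the cycle estimate, that every successful cycle shrinks the library by exactly that fraction; both are idealizations not stated in the text.

```python
# Illustrative sketch of the sequential-diminution arithmetic. Names are hypothetical.
import math


def p_all_critical(fraction_kept, n_critical):
    """Probability that one set retains every critical gene, assuming each gene
    is retained independently with probability fraction_kept."""
    return fraction_kept ** n_critical


def expected_complete_sets(n_sets, fraction_kept, n_critical):
    """Expected number of sets, out of n_sets, holding all critical genes."""
    return n_sets * p_all_critical(fraction_kept, n_critical)


def idealized_cycles(library_size, n_critical, fraction_kept):
    """Cycles needed to shrink the library to the critical genes alone,
    if each cycle multiplies the library size by fraction_kept."""
    return math.ceil(math.log(library_size / n_critical) / math.log(1.0 / fraction_kept))


# Five critical genes, sets each receiving half of a 1,000,000-gene library.
print(p_all_critical(0.5, 5))               # 1/32
print(expected_complete_sets(100, 0.5, 5))  # about 3 of the 100 sets
print(idealized_cycles(1_000_000, 5, 0.5))  # on the order of 20 cycles

# Twenty critical genes, sets each receiving 80% of the library.
print(round(p_all_critical(0.8, 20), 4))
print(round(expected_complete_sets(200, 0.8, 20), 1))
```

The choice of the fraction received by each set is a trade-off: a larger fraction makes it more likely that a set contains all the critical genes, but each cycle then removes fewer of the irrelevant ones.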

[0122] The principles and procedures described above generalize from the case of peptides to numerous areas of chemistry in which chemical reactions take place in aqueous medium, in temperature, pH, and concentration conditions which permit general enzymatic function. In each case it is necessary to make use of an assay method to detect the formation of the desired target compound(s). It is also necessary to choose a sufficiently large number of building block compounds to augment the number of reaction sequences which lead to the target compound.

[0123] The concrete example which was given for the synthesis of a target pentapeptide can also be generalized as follows:

[0124] The procedure as described generates, among other products, stochastic peptides and proteins. These peptides or proteins can act, catalytically or in other ways, on other compounds. They can equally constitute the substrates on which they act. Thus, it is possible to select (or screen) for the capacity of such stochastic peptides or proteins to interact among themselves and thereby modify the conformation, the structure or the function of some among them. Similarly, it is possible to select (or screen) for the capacity of these peptides and proteins to catalyse among themselves hydrolysis, condensation, transpeptidation or other reactions modifying the peptides. For example, the hydrolysis of a given stochastic peptide by at least one member of the set of stochastic peptides and proteins can be followed and measured by radioactive marking of the given protein followed by an incubation with a mixture of the stochastic proteins in the presence of ions such as Mg, Ca, Zn, Fe and ATP or GTP. The appearance of radioactive fragments of the marked protein is then measured as described. The stochastic protein(s) which catalyse this reaction can again be isolated, along with the gene(s) producing them, by sequential diminution of the library of transformed clones, as described above.

[0125] An extension of the procedure consists in the selection of an ensemble of stochastic peptides and polypeptides capable of catalyzing a set of reactions leading from the initial building blocks (amino acids and small peptides) to some of the peptides or polypeptides of the set. It is therefore also possible to select an ensemble capable of catalyzing its own synthesis; such a reflexively autocatalytic set can be established in a chemostat where the products of the reactions are constantly diluted, but where the concentration of the building blocks is maintained constant. Alternatively, synthesis of such a set is aided by enclosing the complex set of peptides in liposomes by standard techniques. In a hypertonic aqueous environment surrounding such liposomes, condensation reactions forming larger peptides lower the osmotic pressure inside the liposomes and drive water molecules produced by the condensation reactions out of the liposomes, hence favoring synthesis of larger polymers. Existence of such an autocatalytic ensemble can be verified by two dimensional gel electrophoresis and by HPLC, showing the synthesis of a stable distribution of peptides and polypeptides. The appropriate reaction volume depends on the number of molecular species used, and the concentrations necessary to favor the formation of peptide bonds over their hydrolysis. The distribution of molecular species of an autocatalytic ensemble is free to vary or change due to the emergence of variant autocatalytic ensembles. The peptides and polypeptides which constitute an autocatalytic set may have certain elements in common with the large initial ensemble (constituted of coded peptides and polypeptides as given by our procedure) but can also contain peptides and polypeptides which are not coded by the ensemble of stochastic genes coding for the initial ensemble.

[0126] The set of stochastic genes whose products are necessary to establish such an autocatalytic set can be isolated as has been described, by sequential diminution of the library of transformed clones. In addition, an autocatalytic set can contain coded peptides initially coded by the stochastic genes and synthesized continuously in the autocatalytic set. To isolate this coded subset of peptides and proteins, the autocatalytic set can be used to obtain, through immunization in an animal, polyclonal sera recognizing a very large number of the constituents of the autocatalytic set.

[0127] These sera can be utilized to screen the library of stochastic genes to find those genes expressing proteins able to combine with the antibodies present in the sera.

[0128] This set of stochastic genes expresses a large number of coded stochastic proteins which persist in the autocatalytic set. The remainder of the coded constituents of such an autocatalytic set can be isolated by serial diminution of the library of stochastic genes, from which the subset detected by immunological methods has first been subtracted.

[0129] Such autocatalytic sets of peptides and proteins, obtained as noted, may find a number of practical applications.

We claim:
 1. A process for the production of a peptide, polypeptide, orprotein having a predetermined property, comprising the steps of:producing by synthetic polynucleotide coupling, stochastically generatedpolynucleotide sequences; forming a library of expression vectorscontaining such stochastically generated polynucleotide sequences;culturing host cells containing the vectors to produce peptides,polypeptides, or proteins encoded by the stochastically generatedpolynucleotide sequences; carrying out screening or selection on suchhost cells, to identify a peptide, polypeptide, or protein produced bythe host cells having the predetermined property; isolating astochastically generated polynucleotide sequence which encodes theidentified peptide, polypeptide, or protein; using the isolated sequenceto produce the peptide, polypeptide, or protein having the predeterminedproperty.
 2. A process for the production of a peptide, polypeptide, orprotein having a predetermined property, comprising the steps of:producing at least partially stochastic synthetic polynucleotidesequences, introducing the at least partially stochastic polynucleotidesequences thus obtained into host cells, cultivating the transformedhost cells containing these at least partially stochastic polynucleotidesequences so as to clone the stochastic polynucleotide sequences andlead to the production of peptides, polypeptides, or proteins expressedby at least some of these stochastic polynucleotide sequences, carryingout screening and/or selection methods on such clones of transformedhost cells to identify those clones producing the peptide, polypeptide,or protein having the predetermined property, isolating the clones soidentified, and growing the isolated clones in a manner so as to producethe peptide, polypeptide, or protein having the predetermined property.3. The process according to claim 2 wherein the at least partiallystochastic polynucleotide sequences are introduced into host cells by anexpression vector.
 4. A process for the production of a peptide,polypeptide, or protein having a predetermined property bymicrobiological means, comprising the steps of: simultaneously producingin a common milieu expression vectors which include at least onestochastic sequence of polynucleotides, introducing the expressionvectors thus obtained into host cells, cultivating the host cellscontaining these expression vectors so as to clone the stochasticsequences and lead to the production of peptides, polypeptides, orproteins expressed by at least some of these stochastic sequences,carrying out screening and/or selection methods on such clones toidentify those clones producing the peptide, polypeptide, or proteinhaving the predetermined property, isolating the clones so identified,and growing the isolated clones in a manner so as to produce thepeptide, polypeptide, or protein having the predetermined property.
5. A process for the production of a peptide, polypeptide, or protein having a predetermined property by microbiological means, comprising the steps of: simultaneously producing in a common milieu expression vectors which include at least one stochastic sequence of polynucleotides, introducing the expression vectors thus obtained into host cells, cultivating the host cells containing these expression vectors so as to clone the expression vector and lead to the production of peptides, polypeptides, or proteins expressed by at least some of these expression vectors, carrying out screening and/or selection methods on such clones producing the peptide, polypeptide, or protein having the predetermined property, isolating the clones so identified, and growing the isolated clones in a manner so as to produce the peptide, polypeptide, or protein having the predetermined property.
6. A process for the production of a peptide, polypeptide, or protein having a predetermined property, comprising the steps of: producing a library of at least partially stochastic synthetic polynucleotide sequences, amplifying and translating these at least partially stochastic polynucleotide sequences to produce peptides, polypeptides, or proteins expressed by at least some of these stochastic polynucleotide sequences, and carrying out screening and/or selection methods to obtain the peptide, polypeptide, or protein having at least the predetermined property.
7. The process according to claim 6 wherein the at least partially stochastic polynucleotide sequences undergo extracellular amplification.
8. The process according to claim 1 wherein polynucleotide sequences are produced by stochastic copolymerization of desired ratios of the deoxyphosphonucleotides guanine, cytosine, thymidine, and adenine, starting from the two extremities of an expression vector which was previously linearized, then by formation of cohesive extremities to create a first strand of stochastic DNA constituting a molecule of expression vector possessing two stochastic sequences whose 3′ extremities are complementary, followed by synthesis of the second strand of the stochastic DNA.
9. The process according to claim 1 wherein the polynucleotide sequences are produced by stochastic copolymerization of double stranded oligonucleotides which do not have cohesive ends, in a manner so as to form fragments of stochastic DNA, followed by ligation of these fragments in an expression vector which was previously linearized.
10. The process according to claim 1 wherein the host cells are prokaryotic cells.
11. The process according to claim 10 wherein the prokaryotic cells are HB 101 or C 600.
12. The process according to claim 1 wherein the host cells are eukaryotic cells.
13. The process according to claim 1 wherein the expression vector is a plasmid.
14. The process according to claim 13 wherein the plasmid is pUC8.
15. The process according to claim 1 wherein the expression vector is viral DNA.
16. The process according to claim 1 wherein the expression vector is a hybrid of plasmid and viral DNA.
17. The process according to claim 1 wherein the expression vector is a phage.
18. The process according to claim 1 wherein during selection, the DNA which encodes for the peptide, polypeptide, or protein is mutagenized.
19. The process according to claim 18 wherein the DNA undergoes in vivo mutagenesis by recombination.
20. The process according to claim 18 wherein the DNA undergoes in vitro mutagenesis by mutagens.
21. The process according to claim 18 wherein the predetermined property is an adaptive improvement in the capacity to fulfill a desired function.
22. The process according to claim 1 wherein the predetermined property is the capacity to catalyse a given chemical reaction.
23. The process according to claim 1 wherein the predetermined property is the capacity to simulate at least one biologically active compound.
24. The process according to claim 23 wherein the biologically active compound is a member of the group consisting of hormones, neurotransmitters, growth factors, and specific regulators of replication, transcription, or translation of nucleic acids.
25. The process according to claim 1 wherein the predetermined property is the capacity to modify at least one biological function of at least one biologically active compound.
26. The process according to claim 25 wherein the biologically active compound is a member of the group consisting of hormones, neurotransmitters, growth factors, and specific regulators of replication, transcription, or translation of nucleic acids.
27. The process according to claim 1 wherein the predetermined property is that of having at least one epitope similar to one of the epitopes of a given antigen.
28. The process according to claim 27 wherein the antigen is epidermal growth factor.
29. The process according to claim 27 wherein the said property is the capacity to simulate an epitope of a given antigen, and that screening and/or selection of clones producing a peptide, polypeptide, or protein having this property is carried out by obtaining molecules which bind to that epitope, utilizing these molecules so obtained to identify those clones containing the peptide, polypeptide, or protein, growing the clones thus identified, separating and purifying the peptide, polypeptide, or protein produced by the clones, and then submitting these peptide(s), polypeptide(s), or protein(s) to an in vitro assay to verify that it has the said property.
30. The process according to claim 27 wherein the peptide, polypeptide, or protein having said property is used to make a vaccine.
31. The process according to claim 27 wherein an ensemble of peptides, polypeptides, or proteins having said property is used to make a vaccine.
32. The process according to claim 1 wherein the clones producing a peptide, polypeptide, or protein having the predetermined property are identified and isolated by affinity chromatography on antibodies corresponding to a protein expressed by a natural fragment of DNA.
33. The process according to claim 32 wherein after expression and purification of the identified peptide, polypeptide, or protein, novel fragments of the peptide, polypeptide, or protein having the predetermined property are separated and isolated.
34. The process according to claim 32 wherein the natural fragment of DNA contains a gene expressing β-galactosidase, and that the peptide, polypeptide, or protein is identified and isolated by affinity chromatography with anti-β-galactosidase antibodies.
35. The process according to claim 34 wherein after expression and purification of the identified peptide, polypeptide, or protein, novel fragments of the peptide, polypeptide, or protein having the predetermined property are separated and isolated.
36. The process according to claim 1 wherein the predetermined property is the capacity to bind to a given compound.
37. The process according to claim 36 wherein the said given compound is a member of the group consisting of peptides, polypeptides, and proteins.
38. The process according to claim 36 wherein the said given compound is a member of the group consisting of sequences of DNA and RNA.
39. The process according to claim 38 wherein the sequence of DNA acts as a cis-regulatory sequence of replication or transcription of a neighboring sequence of DNA.
40. The process according to claim 1 wherein the number of different peptides, polypeptides, or proteins expressed by all of the at least partially stochastic polynucleotide sequences is greater than 10,000.
41. The process according to claim 2 wherein the number of different peptides, polypeptides, or proteins expressed by all of the at least partially stochastic polynucleotide sequences is greater than 10,000.
42. The process according to claim 6 wherein the number of different peptides, polypeptides, or proteins expressed by all of the at least partially stochastic polynucleotide sequences is greater than 10,000.
43. The process according to claim 1 wherein DNA having the predetermined property is isolated from the clones.
44. The process according to claim 1 wherein RNA having the predetermined property is isolated from the clones.
45. A process for the production of an expression vector which includes at least one stochastic sequence of polynucleotides, comprising the steps of: providing, in a common milieu, at least three oligonucleotides, said oligonucleotides comprising at least 7 nucleotides, polymerizing said nucleotides to form stochastic double stranded DNA fragments, and ligating said stochastic double stranded DNA fragments into an expression vector.
46. The process according to claim 45 wherein the oligonucleotides are heptamers.
47. The process according to claim 46 wherein the transforming DNA derived from a culture of clones is isolated and purified, the purified DNA is cut by means of at least one restriction enzyme which corresponds to a specific restriction site present in the palindromic heptamers or octamers but absent from the expression vector, then the ensemble of linearized stochastic DNA fragments obtained are simultaneously treated with T4 ligase in a manner to create a new ensemble of DNA containing new stochastic sequences, and this new ensemble of DNA is used to transform host cells.
48. The process according to claim 46 wherein the heptamers are palindromic.
49. The process according to claim 48 wherein the palindromic heptamers are selected from the group consisting of:
5′ XTCGCGA 3′
5′ XCTGCAG 3′
5′ RGGTACC 3′
where X=A, G, C, or T, and R=A or T.
50. The process according to claim 46 wherein the oligonucleotides are octamers.
51. The process according to claim 50 wherein the transforming DNA derived from a culture of clones is isolated and purified, the purified DNA is cut by means of at least one restriction enzyme which corresponds to a specific restriction site present in the palindromic heptamers or octamers but absent from the expression vector, then the ensemble of linearized stochastic DNA fragments obtained are simultaneously treated with T4 ligase in a manner to create a new ensemble of DNA containing new stochastic sequences, and this new ensemble of DNA is used to transform host cells.
52. The process according to claim 50 wherein the octamers are palindromic.
53. The process according to claim 52 wherein the palindromic octamers are selected from the group consisting of:
5′ GGAATTCC 3′
5′ GGTCGACC 3′
5′ CAAGCTTG 3′
5′ CCATATGG 3′
5′ CATCGATG 3′
54. The process according to claim 45 wherein the stochastic double stranded DNA fragments are about 160 to 800 base pairs in length.
55. A process for the production of an expression vector which includes at least one stochastic sequence of polynucleotides, comprising the steps of: linearizing an expression vector, reacting said linearized vector with terminal transferase in the presence of desired ratios of deoxynucleotide-triphosphates of guanine, cytosine, thymidine, and adenine to form polymers, hybridizing said polymers, and treating said polymers to form expression vectors consisting of stochastic double stranded DNA.
56. A process for the production of an expression vector comprising the steps of: producing a library of expression vectors which include at least one stochastic synthetic sequence of polynucleotides, introducing the expression vectors thus obtained into host cells, cultivating the host cells containing these expression vectors so as to clone the expression vector and lead to the production of peptides, polypeptides, or proteins expressed by at least some of these expression vectors, carrying out screening and/or selection methods on such clones producing the peptide, polypeptide, or protein having the predetermined property, and isolating the cloned expression vectors so identified.
57. A process for the production of a peptide, polypeptide, or protein having a predetermined property, comprising the steps of: producing by synthetic polynucleotide coupling, stochastically generated polynucleotide sequences; forming a library of expression vectors containing such stochastically generated polynucleotide sequences; culturing host cells containing the vectors to produce peptides, polypeptides, or proteins encoded by the stochastically generated polynucleotide sequences; carrying out screening or selection on such host cells, to identify a peptide, polypeptide, or protein produced by the host cells having the predetermined property.
58. A process for the production of a peptide, polypeptide, or protein having a predetermined property, comprising the steps of: producing at least partially stochastic polynucleotide sequences, introducing the at least partially stochastic polynucleotide sequences thus obtained into host cells, cultivating the transformed host cells containing these at least partially stochastic polynucleotide sequences so as to clone the stochastic polynucleotide sequences and lead to the production of peptides, polypeptides, or proteins expressed by at least some of these stochastic polynucleotide sequences, carrying out screening and/or selection methods on such clones of transformed host cells to identify the peptide, polypeptide, or protein having the predetermined property.
59. A process for the production of a host cell containing an expression vector which includes at least one stochastic sequence of polynucleotides, comprising the steps of: producing a library of expression vectors which include at least one stochastic sequence of polynucleotides, said expression vectors being produced by the following steps: providing, in a common milieu, at least three oligonucleotides, said oligonucleotides comprising at least 7 nucleotides; polymerizing said nucleotides to form stochastic double stranded DNA fragments; and ligating said stochastic double stranded DNA fragments into an expression vector; introducing the expression vectors thus obtained into host cells; and cultivating the host cells containing these expression vectors so as to clone the stochastic sequences and lead to the production of peptides, polypeptides, or proteins expressed by at least some of these stochastic sequences.
60. A process for the production of a host cell containing an expression vector which includes at least one stochastic sequence of polynucleotides, comprising the steps of: producing a library of expression vectors which include at least one stochastic sequence of polynucleotides, said expression vectors being produced by the following steps: linearizing an expression vector, reacting said linearized vector with terminal transferase in the presence of desired ratios of deoxynucleotide-triphosphates of guanine, cytosine, thymidine, and adenine to form polymers, hybridizing said polymers, and treating said polymers to form expression vectors consisting of stochastic double stranded DNA; introducing the expression vectors thus obtained into host cells; and cultivating the host cells containing these expression vectors so as to clone the stochastic sequences and lead to the production of peptides, polypeptides, or proteins expressed by at least some of these stochastic sequences.
61. A process for the production of a host cell containing an expression vector which includes at least one stochastic sequence of polynucleotides, comprising the steps of: producing a library of expression vectors which include at least one stochastic synthetic sequence of polynucleotides, introducing the expression vectors thus obtained into host cells, cultivating the host cells containing these expression vectors so as to clone the stochastic sequences and lead to the production of peptides, polypeptides, or proteins expressed by at least some of these stochastic sequences, carrying out screening and/or selection methods on such clones to identify those clones producing the peptide, polypeptide, or protein having the predetermined property, and isolating the cloned host cells so identified.
62. A library of expression vectors comprising stochastic polynucleotide sequences which encode at least 10,000 peptides, polypeptides, or proteins.
63. An expression vector produced in accordance with the process of claim 45, 55, or 56.
64. A host cell produced in accordance with the process of claim 59, 60, or 61.
65. The peptide, polypeptide, or protein produced in accordance with claim 1, 2, 4, 5, 6, 57, or 58.
66. A process for the production of an ensemble of peptides, polypeptides, or proteins, which ensemble has the capacity to catalyze a sequence of chemical reactions to thereby produce a desired compound, comprising the steps of: a. producing a library of at least partially stochastic polynucleotide sequences, b. introducing the at least partially stochastic polynucleotide sequences thus obtained into host cells, c. simultaneously cultivating the independent transformed host cells containing these at least partially stochastic polynucleotide sequences so as to clone the stochastic polynucleotide sequences and lead to the production of a first set of peptides, polypeptides, or proteins expressed by at least some of these stochastic polynucleotide sequences, d. combining the first set of peptides, polypeptides, or proteins with a collection of precursors to said desired compound under conditions favorable for said sequence of chemical reactions; e. determining if said desired compound is produced, and if so, dividing the first set of peptides, polypeptides, or proteins into a plurality of subsets; f. combining each of the subsets with the collection of precursors under conditions favorable for said sequence of chemical reactions; g. determining, subset by subset, if said desired compound is produced with each of said subsets, and if so, dividing that subset into further subsets; and h. repeating steps f. and g. until the ensemble of peptides, polypeptides, or proteins which catalyze the sequence of chemical reactions to produce the desired compound is identified.
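For illustration only, and not as part of the claims, the subdivision logic recited in steps d. through h. of claim 66 might be sketched as follows. This is a minimal sketch under stated assumptions: the function name `narrow_ensemble`, the callable `produces_compound` (a stand-in for the laboratory assay that determines whether the desired compound is produced), the clone labels, and the equal-sized splitting rule are all hypothetical and do not appear in the specification. The sketch also assumes that a pool is reported as the result when none of its subsets still yields the compound, a case the claim does not address.

```python
from typing import Callable, List, Sequence


def narrow_ensemble(
    pool: Sequence[str],
    produces_compound: Callable[[Sequence[str]], bool],
    n_subsets: int = 2,
) -> List[List[str]]:
    """Repeatedly split positive pools; return pools that cannot be narrowed further.

    `produces_compound` is a hypothetical stand-in for the assay of steps e. and g.
    """
    # Steps d./e.: assay the first set; stop if the compound is not produced.
    if not produces_compound(pool):
        return []

    results: List[List[str]] = []
    frontier: List[List[str]] = [list(pool)]
    while frontier:
        current = frontier.pop()
        # Steps e./g.: divide a positive pool into roughly equal subsets
        # (equal-sized contiguous chunks are an assumption of this sketch).
        size = max(1, -(-len(current) // n_subsets))  # ceiling division
        subsets = [current[i:i + size] for i in range(0, len(current), size)]
        # Steps f./g.: assay each strictly smaller subset.
        positives = [
            s for s in subsets
            if len(s) < len(current) and produces_compound(s)
        ]
        if positives:
            # Step h.: repeat on every subset that still yields the compound.
            frontier.extend(positives)
        else:
            # No smaller subset is positive, so this pool is reported.
            results.append(current)
    return results


if __name__ == "__main__":
    clones = [f"clone_{i}" for i in range(8)]
    needed = {"clone_2", "clone_3"}  # hypothetical two-member ensemble

    def assay(subset: Sequence[str]) -> bool:
        # Toy assay: positive only when every needed member is present.
        return needed <= set(subset)

    print(narrow_ensemble(clones, assay))  # [['clone_2', 'clone_3']]
```

Because the ensemble catalyzes a sequence of reactions, a split that separates its members leaves every subset negative. In this sketch the search then stops at the larger pool, so in practice a different division of that pool would need to be tried.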